Similarity Multidimensional Indexing

Authors

  • Elena Mikhaylova
  • Boris Novikov
  • Anton Volokhov
Abstract

The multidimensional k-NN (k nearest neighbors) query problem arises in a wide variety of database applications, including information retrieval, natural language processing, and data mining. To solve it efficiently, a database needs an indexing structure that supports this kind of search. However, an exact solution is hardly feasible in multidimensional space. In this paper we describe and analyze an indexing technique for an approximate solution of the k-NN problem. The indexing tree is constructed by clustering, and the hash index is built on s-stable distributions. The indices are implemented on top of a high-performance industrial DBMS.
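
As a rough illustration of hashing based on stable distributions, the sketch below uses the common scheme h(v) = floor((a·v + b) / w), where a is drawn from a stable distribution. It is not the authors' implementation, which the abstract says runs on top of an industrial DBMS. The choice of the Gaussian distribution (the s = 2 case), the in-memory bucket table, and the names `StableLSH`, `num_hashes`, and `bucket_width` are all assumptions made for this example.

```python
import math
import random
from collections import defaultdict


class StableLSH:
    """Hypothetical in-memory hash index using Gaussian (2-stable) projections."""

    def __init__(self, dim, num_hashes=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # One random Gaussian vector a and one offset b in [0, w) per hash function.
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_hashes)]
        self.b = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]
        self.buckets = defaultdict(list)

    def key(self, v):
        # Concatenate the individual hash values floor((a.v + b) / w) into one key.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in zip(self.a, self.b)
        )

    def add(self, v):
        self.buckets[self.key(v)].append(tuple(v))

    def query(self, q, k):
        # Approximate k-NN: only points in the query's own bucket are ranked,
        # so true neighbors that fell into other buckets are missed.
        candidates = self.buckets.get(self.key(q), [])
        return sorted(candidates, key=lambda p: math.dist(p, q))[:k]
```

Nearby points tend to receive the same key because a projection onto a stable-distributed vector preserves distances in distribution. In practice several such tables are usually combined to raise recall; this sketch keeps a single table for brevity.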

Similar resources

Indexing Issues in Supporting Similarity Searching

Indexing issues that arise in the support of similarity searching are presented. This includes a discussion of the curse of dimensionality, as well as multidimensional indexing, distance-based indexing, dimension reduction, and embedding methods.

Individual Study Option: Scalable Multimedia Database Indexing

Most image or video search engines perform similarity search by extracting and storing feature vectors from the multimedia objects. Thus, the similarity search is transformed into a search for points that are close to a given query point in the high-dimensional feature space. Multidimensional indexing structures are supposed to cut this process short and quickly return the m...

An Efficient Indexing Method for Box Queries in NDDS Spaces using BoND-tree

Similarity searches in multidimensional Non-ordered Discrete Data Spaces (NDDS) are becoming increasingly important for application areas such as bioinformatics, biometrics, data mining and E-commerce. Efficient similarity searches require robust indexing techniques. Box queries (or window queries) are a type of query which specifies a set of allowed values in each dimension. Unfortunately, exi...

Indexing Images with Multiple Regions

Similarity indexing using Spatial Access Methods (SAMs) such as R-trees assumes that each data entity (or query) is represented by exactly one multidimensional point. However, several applications, including indexing and retrieval of multimedia data such as one-dimensional signals and images, require that each data entity be represented by multiple points in a multidimensio...

On the effective clustering of multidimensional data sequences

In this paper, we investigate the problem of clustering multidimensional data sequences such as video streams. Each sequence is represented by a small number of hyper-rectangular clusters for subsequent indexing and similarity search processing. We present a linear clustering algorithm that guarantees the predefined level of clustering quality, and show its effectiveness via experiments on vari...

Publication date: 2011